ABSTRACT

Energetic feedback processes during the formation of galaxy clusters may have heated and ionized a large 
fraction of the intergalactic gas in proto-cluster regions. When such a highly ionized hot "super-bubble" falls 
along the sightline to a background quasar, it would be seen as a large void, with little or no absorption, in the 
Lyman-α forest. We examine the spectra of 137 quasars in the Sloan Digital Sky Survey to search for such
voids, and find no clear evidence of their existence. The size distribution of voids in the range $5\,\text{Å} \lesssim \Delta\lambda \lesssim 70\,\text{Å}$
(corresponding to physical sizes of $3\,h^{-1} \lesssim R \lesssim 35\,h^{-1}$ comoving Mpc) is consistent with the standard model for
the Lyman-α forest without additional hot bubbles. We adapt a physical model for HII bubble growth during
cosmological reionization (Furlanetto, Zaldarriaga & Hernquist 2004) to describe the expected size distribution
of hot super-bubbles at $z \sim 3$. This model incorporates the conjoining of bubbles around individual neighboring
galaxies. Using the non-detection of voids, we find that models in which the volume filling factor of hot bubbles
exceeds $\sim 20$ percent at $z \sim 3$ can be ruled out, primarily because they overproduce the number of large (40-50 Å)
voids. We conclude that any pre-heating mechanism that explains galaxy cluster observations must avoid heating
the low-density gas in the proto-cluster regions, either by operating relatively recently ($z \lesssim 3$) or by depositing
entropy in the high-density regions. 

Subject headings: methods: data analysis - large-scale structure of the universe - intergalactic medium - 
quasars: absorption lines 
1. INTRODUCTION

Observations of galaxy clusters suggest that feedback played 
an important role during their formation. The simplest mod- 
els for galaxy clusters neglect feedback and assume the gravita- 
tional collapse of a dark matter halo, accompanied by gas infall. 
These models fail to reproduce either the observed scaling re- 
lations between bulk characteristics, or the structural properties 
of individual clusters (e.g. Bialek et al. 2001; Voit et al. 2002). 
For example, the self-similar gas distribution expected in this 
model predicts the relation between X-ray luminosity and temperature,
$L_X \propto T^2$ (Kaiser 1986), whereas observations find a
steeper relation, closer to $L_X \propto T^3$ (e.g., David et al. 1993;
Arnaud & Evrard 1999; Helsdon & Ponman 2000). A com- 
pelling suggestion to explain the discrepancy is that the intra- 
cluster gas has been pre-heated, i.e., raised to a higher adiabat, 
at an early stage in the formation of the cluster. The resulting
so-called "entropy floor" would then preferentially affect
low-mass clusters. Indeed, the observed $L_X$-$T$ and related
scaling relations are well reproduced in models that simply endow
the gas with an extra entropy, of order $\sim 100$ keV cm$^2$, before
its collapse (Voit & Bryan 2001; Bialek et al. 2001; Voit,
Bryan, Balogh & Bower 2002; McCarthy et al. 2003a,b). The 
physical mechanism responsible for the pre-heating could be 
supernova-driven galactic winds, or the radiation output of ac- 
tive galactic nuclei (AGN). 

The simplest form of this pre-heating model does not ap- 
pear to provide an acceptable fit to the detailed cluster profiles 
(Pratt et al. 2005, 2006; Younger & Bryan 2007). Neverthe- 
less, the broader idea, namely that energetic processes strongly 
influenced at least parts of the intergalactic medium (IGM), corresponding
to proto-cluster regions, at early times, remains viable.
Indeed, there is considerable empirical support for this
broader picture. The global star-formation rate, inferred from
galaxies discovered at redshift z ~ 3, such as the so-called Lyman- 
break galaxies (LBGs), appears significantly higher than the 
star formation rate in the local universe. Energetic "superwinds" 
from LBGs at $z \sim 3$ have been inferred directly from their UV
spectra, which show offsets of several hundred km/s between stellar
and interstellar lines (Heckman et al. 2000; Pettini et al. 2001).
Similar winds are known to accompany nearby star-bursts (e.g. 
Heckman et al. 1990) and such winds would be natural candi- 
dates for large-scale feedback at earlier times. 

Indeed, various studies have suggested that winds from galaxies
can affect not only the galaxy itself, but also the surrounding
IGM out to a distance approaching $\sim 1$ (comoving)
Mpc, which may affect global Lyman-α absorption statistics
(e.g. Fang et al. 2005; Kollmeier et al. 2006; Desjacques et
al. 2006). Recent works have focused on interpreting observations
of Lyman-α absorption statistics in quasar spectra with
sightlines passing near LBGs. The observations possibly indicate
a reduced level of absorption within $\sim 1\,h^{-1}$ Mpc of LBGs
(Adelberger et al. 2003; 2005), which may be attributable to
the impact of these galaxies on the ambient IGM (but see Desjacques
et al. 2006). In the context of the LBGs, several groups
have used numerical simulations to study the metal enrichment
of the IGM by galactic outflows, and the corresponding impact
on the global Lyman-α absorption statistics (Theuns et al. 2002;
Bruscoli et al. 2003; McDonald et al. 2005).

The present study is motivated by the related suggestion of 
Theuns et al. (2001), that the preheating of large proto-cluster 
regions may leave a direct imprint on the global Lyman-α forest
absorption statistics. Irrespective of the physical mechanism,
the pre-heating would likely ionize the hydrogen in the proto-cluster
region, and the resulting hot bubble would be optically
thin in Lyman line absorption. Theuns et al. (2001) proposed
that if the suggested entropy level does exist, the highly ionized
proto-cluster regions could produce large voids: stretches of
wavelength as long as $\sim 20$ Å with little or no absorption.

Such hot proto-cluster bubbles could be an order of magnitude
larger (in linear size) than the $\sim 1\,h^{-1}$ Mpc ionized bubbles
that may envelop individual LBGs. These large proto-cluster
bubbles may, of course, correspond to a clustered group
of bubbles around LBGs. On the other hand, they could be pro- 
duced by the collective effect of a group of galaxies that are 
much smaller and/or formed earlier than the known population 
of LBGs. In this case, the large voids may be more readily 
identified by studying the global Lyman-α forest statistics.

Historically, a few authors have searched for large voids in 
the Lyman-α forest (Atwood et al. 1985; Crotts 1987; Ostriker,
Bajtlik & Duncan 1988; Duncan, Ostriker & Bajtlik 1989; Dobrzycki
& Bechtold 1991). In the standard model for the Lyman-α
forest, the absorption lines are produced by fluctuations in
the density field. Observed statistics, such as the column density
distribution and evolution, and the spatial distribution of the
absorbers, are consistent with a model in which the gas traces
the primordial dark-matter fluctuations, and is kept photoionized
by a uniform metagalactic radiation (Miralda-Escudé et al.
1996; Hui & Gnedin 1997). On large scales ($\gtrsim 10\,h^{-1}$ Mpc),
the Lyman-α absorbers are essentially randomly distributed in
space, and their incidence rate statistics in quasar spectra are de- 
scribed by a Poisson distribution. The studies listed above have 
identified only a handful of candidates for large voids that were 
discrepant with a Poisson distribution (and not associated with 
the proximity effect of the background quasar itself), but none 
of these have been confirmed at high statistical significance. 

In this paper, we perform a new search for large voids in the 
Lyman-α forest. Our analysis differs from existing studies in
two important ways. First, we use a larger sample of quasar 
spectra, available from the Sloan Digital Sky Survey (SDSS). 
Second, while we adopt the same null-hypothesis as previous 
works (i.e. a Poisson distribution for the absorbers), we use a 
new physical model for the bubble distribution that tracks the 
conjoining of bubbles around individual galaxies. These "merg- 
ers" between bubbles are important when their volume-filling 
factor rises above a few percent: galaxies are clustered in space, 
and a single large void will typically contain many galaxies. 
As a result, mergers are a way to produce larger, possibly de- 
tectable voids. 

The rest of this paper is organized as follows. In § 2 we
explain how we model the mass function of highly ionized regions.
In § 3 we briefly describe the observational data that we
used. In § 4 we introduce our approach of statistically comparing
the predicted and observed void distributions in the Lyman-α
forest. In § 5 we present our main results. In § 6 we discuss
the limitations of our approach, as well as possible future
improvements. In § 7 we briefly summarize our conclusions
and the implications of this work. Throughout this paper, we
adopt a spatially flat universe dominated by a cosmological
constant and cold dark matter (CDM), with the following set
of cosmological parameters: $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, $\sigma_8 = 0.9$ and
$H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$. These values are consistent with measurements
by the WMAP experiment (Spergel et al. 2003; 2007;
we include a discussion of the sensitivity of our results to the
choice of $\sigma_8$ below).



2. MODELING THE DISTRIBUTION OF THE HIGHLY IONIZED REGIONS

Our main task is to model the abundance and size distribution
of highly ionized proto-cluster regions. In the back-of-the-envelope
estimate of Theuns et al. (2001), a proto-cluster
that later develops into a cluster of mass $M$ was treated as a
uniform sphere containing the same amount $M$ of fully ionized
gas at some fixed overdensity $\delta$ relative to the cosmic mean gas
density at redshift $z$. Here $z$ and $\delta$ are both free parameters, the
relevant values of which would need to be estimated from some
further modeling. Assuming that pre-heating operates at redshift
$z \sim 3$ and that the mean overdensity in the proto-cluster
region is $\delta \sim 1$, they calculated the proto-cluster size distribution
from the known mass function of galaxy clusters. They
concluded that the typical size of a void at this redshift should
be a few $\times\,10$ Å, which would appear prominent in high-redshift
($z > 3$) quasar spectra. Based on the local abundance of massive
clusters, they estimated that there should be approximately
one such void per unit redshift. 

An obvious refinement of this simple estimate is to make 
a connection between redshift and overdensity by using the 
spherical collapse model. Given the current overdensity of clus- 
ters and their collapse redshift, the cluster's expected overden- 
sity at some higher redshift can be obtained directly. This ap- 
proach would eliminate one free parameter, but would still miss 
an important ingredient of cluster formation: mergers. In the hi- 
erarchical structure formation scenario, proto-clusters are more 
likely to be made up of many smaller clumps that would later 
merge together. Star formation and AGN activity in these clumps
(which could represent an individual galaxy, or a small group of 
highly clustered galaxies) could then ionize gas in their vicin- 
ity. Hereafter we will refer to the area ionized around a sin- 
gle progenitor clump as a "hot bubble". Several hot bubbles, 
initially generated independently in different collapsed regions, 
may overlap with each other, and form a larger "super-bubble". 
In order to compute the distribution of voids in the Lyman-α
forest, we first need to get the mass function of these super- 
bubbles. 

Furlanetto et al. (2004) have studied an analogous problem 
for the mergers of ionized bubbles at higher redshifts, in the 
context of cosmological reionization. Here we adopt their for- 
malism, and apply it to the super-bubbles at lower redshift. In 
the context of reionization, the formalism has been compared 
to numerical simulations of the growth of ionized bubbles, and 
was found to accurately reproduce the size-distribution and large- 
scale clustering properties of ionized bubbles (Zahn et al. 2007). 
We caution, however, that a similar test against simulations has 
not yet been performed at lower redshifts (see discussion be- 
low). The formalism is based on the simple assumption that 
a collapsed DM halo can ionize a region whose mass is pro- 
portional to the halo's own mass. The effective proportionality 
coefficient, denoted as $\zeta$ in

$m_{\rm ion} = \zeta\, m_{\rm col}$, (1)
depends, in the context of reionization, on the efficiency of ion- 
izing photon production, escape fraction of these photons from 
the host galaxy, the star-formation efficiency, and the mean 
number of recombinations. In our case, we define an analo- 
gous coefficient between the mass of a bubble and the mass of 
a halo, 

$m_{\rm bubble} = \zeta\, m_{\rm col}$. (2)

The value of this coefficient should depend on the velocity, tem- 
perature, and typical age of galactic winds, or, alternatively, on 
similar parameters for the typical AGN outflows. However, this
simplified description could plausibly describe other scenarios, 
as well (e.g. a simple photo-ionization proximity effect). 

The condition $\zeta > 1$ has to be satisfied in order for the winds
to propagate outside the DM halos and generate hot bubbles 
in the IGM. In this case, there is a chance that different bub- 
bles can overlap and unite into a larger super-bubble. The 
statistics of the super-bubble size distribution is, in general, 
then driven by this overlap, which, in turn, is governed by the 
large-scale density fluctuations. In order to avoid modeling the 
complex process of overlap, Furlanetto et al. propose to utilize 
the following relation, which must be satisfied for every super- 
bubble: 



$f_{\rm coll} \ge \zeta^{-1}$, (3)



where $f_{\rm coll}$ is the collapsed fraction (the ratio of mass residing
in collapsed halos to the total mass inside the super-bubble),
and is determined using the extended Press-Schechter model.
In this approach, $f_{\rm coll}$ depends on the mean linear overdensity
$\delta_m$ inside the super-bubble. The excursion-set formalism can
be used to find the largest region surrounding an arbitrary point 
in space, where the above relation is satisfied. The final result 
for the mass function of super-bubbles is 

$m \frac{dn}{dm} = \sqrt{\frac{2}{\pi}}\, \frac{\bar\rho}{m} \left| \frac{d\ln\sigma}{d\ln m} \right| \frac{B_0}{\sigma(m)} \exp\left[ -\frac{B^2(m,z)}{2\sigma^2(m)} \right]$, (4)

where $\sigma^2(m)$ is the variance of density fluctuations on the scale
of mass $m$, and $B$ is the critical overdensity. If the mean density
within a region of mass $m$ is higher than $B$, then it is ionized;
$B_0$ is the limiting value of $B$ as $m \to \infty$. The expression
is analogous to the Press-Schechter mass function, except
that the value of the critical overdensity is different and mass-dependent.
This formalism requires us to specify a parameter
$M_{\rm min}$, which is the mass of the smallest collapsed halo that
can produce winds, or a hot bubble. Our fiducial value of $M_{\rm min}$
throughout this paper is set to be $10^{11}\,M_\odot$, but we also consider
a range of values $M_{\rm min} = 10^{9}$, $10^{10}$ or $10^{12}\,M_\odot$. The choice for
the lowest value is motivated by the expectation that the cooling 
and collapse of gas, and therefore star-formation in smaller ha- 
los is prevented by the UV background (Efstathiou 1992; Thoul 
& Weinberg 1996; Dijkstra et al. 2004), whereas the highest 
value corresponds roughly to the largest masses considered for 
LBGs at $z \sim 3$ (Somerville et al. 2002).
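
As a rough illustration of how the condition in equation (3) turns into a mass-dependent critical overdensity, the following Python sketch assumes the standard extended Press-Schechter form of the collapsed fraction, $f_{\rm coll} = {\rm erfc}\{(\delta_c - \delta_m)/\sqrt{2[\sigma^2_{\rm min} - \sigma^2(m)]}\}$, as used by Furlanetto et al. (2004). This explicit form, the function names, and the numerical inputs are assumptions made for illustration; they are not spelled out in the text above.

```python
import math
from statistics import NormalDist


def erfinv(y):
    """Inverse error function, obtained from the standard normal quantile."""
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)


def collapsed_fraction(delta_m, sigma2_min, sigma2_m, delta_c):
    """Assumed extended Press-Schechter collapsed fraction in a region of
    linear overdensity delta_m and variance sigma2_m; sigma2_min is the
    variance on the scale of the smallest halo, M_min."""
    return math.erfc((delta_c - delta_m) / math.sqrt(2.0 * (sigma2_min - sigma2_m)))


def barrier(zeta, sigma2_min, sigma2_m, delta_c):
    """Smallest overdensity for which f_coll >= 1/zeta holds."""
    return delta_c - math.sqrt(2.0 * (sigma2_min - sigma2_m)) * erfinv(1.0 - 1.0 / zeta)


# Hypothetical inputs, chosen only to exercise the functions.
B = barrier(zeta=5.0, sigma2_min=4.0, sigma2_m=1.0, delta_c=6.0)
print(B, collapsed_fraction(B, 4.0, 1.0, 6.0))
```

Evaluating the collapsed fraction exactly at the barrier returns $1/\zeta$, and a larger $\zeta$ lowers the barrier, so that more regions qualify as super-bubbles.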

We assume that the temperature in these hot bubbles is suffi- 
ciently high ( ^ 5 X lO'^K) for hydrogen to be essentially com- 
pletely ionized, and that these regions therefore produce negli- 
gible Lyman a absorption in the spectra of background quasars. 
The signature of such a hot bubble intersecting a quasar sightline
would therefore be a "void" in the Lyman-α forest. We
are now in the position to compute the size distribution of these
voids; the results will be explicitly calculated and shown in § 4
below. 

3. OBSERVATIONAL DATA 

In this section, we briefly describe the data we used for our 
analysis. The spectra were selected from the SDSS Data Re- 
lease 4 (DR4; Adelman-McCarthy et al. 2006). We exam- 
ined 137 quasar spectra with redshifts in the range 3.5 < z < 4. 
We cherry-picked these high-quality spectra by hand from 798 
among the brightest quasars in DR4 in this redshift range. Spectra
were selected so that the S/N is greater than 8, and the wavelength
resolution is about one Angstrom per pixel. From every
spectrum, we only used the Lyman-α region from $1025(1+z)$ Å
to $1215(1+z)$ Å, discarding shorter wavelengths subject to additional
Lyman-β absorption. In order to avoid having to model
the proximity effect, we also excised the wavelength range corresponding
to radial separations of $< 10$ Mpc from the source
quasar. The total wavelength range we analyzed is 112203 Å,
which is equivalent to an effective redshift range of $\Delta z = 92.3$.
The median redshift of the wavelength range we utilized is
$z = 3.3$. For illustration, in Figure 1 we show the spectrum of a
typical source.

Fig. 1. The Lyman-α absorption spectrum of a typical source used in
our analysis. The two vertical dashed lines mark the range of the spectrum we
utilized. This range was chosen to avoid wavelengths that contain Lyman-β
absorption, and the proximity region within 10 Mpc of the quasar.
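
The wavelength bookkeeping above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the rest-frame wavelengths are the rounded values quoted in the text, and the proximity-zone cut is not included.

```python
LYA = 1215.0  # rest-frame Lyman-alpha wavelength in Angstrom (rounded, as in the text)
LYB = 1025.0  # rest-frame Lyman-beta wavelength in Angstrom (rounded, as in the text)


def forest_window(z_qso):
    """Observed wavelength range (Angstrom) between the quasar's Ly-beta and
    Ly-alpha emission; the proximity-zone excision is not applied here."""
    return LYB * (1.0 + z_qso), LYA * (1.0 + z_qso)


def redshift_path(total_wavelength):
    """Redshift interval spanned by a total observed wavelength range of
    Ly-alpha forest, using d(lambda) = LYA * dz."""
    return total_wavelength / LYA


print(forest_window(3.5))
print(round(redshift_path(112203.0), 1))
```

The second line reproduces the effective redshift path of about 92.3 quoted above for the 112203 Å analyzed.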

The raw data include the flux $f_i$ and the noise $n_i$ for each
$\sim 1$ Å wide wavelength bin centered at $\lambda_i$. We first fit the continuum
for every spectrum, using the same clipped-variance
estimator continuum technique employed for the SDSS absorption
line catalog (e.g., York et al. 2005). We then search the
Lyman-α part of the spectrum for voids larger than 5 Å. We neglected
smaller voids because of the limited spectral
resolution, and in order to avoid small-scale correlations between
pixels induced by large-scale structures (the size of the
smallest void we consider, $\approx 3\,h^{-1}$ Mpc, exceeds the correlation
length of the absorption lines by a factor of $\approx 10$; e.g. Cristiani
et al. 1997). Here we define a void to be a contiguous
range of neighboring pixels where the flux-to-continuum ratio
exceeds a certain threshold. The threshold could, in principle,
be very high ($> 99\%$) from the simple theoretical speculation
above. However, noise in the data limits our choice for the
threshold to be smaller than or equal to 80 percent (see a more
detailed discussion in § 4 below). This means that, in effect,
we allow hot bubbles to contain some residual neutral hydrogen
causing $\sim 10\%$ absorption. We create a size distribution of the
voids, i.e. a histogram using discrete wavelength bins covering
the range $5\,\text{Å} < \Delta\lambda < 70\,\text{Å}$ in increments of 5 Å. We generated
mock histograms using different fitting functions, derived
from models with or without hot bubbles (see § 4). The goodness
of fit and the likelihood of each model is obtained from the
usual $\chi^2$ statistic,

$\chi^2 = \sum_i \frac{(N_i - n_i)^2}{n_i}$, (5)

where $N_i$ and $n_i$ are the numbers of observed and predicted voids
in bin $i$, respectively.
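
The void definition and the histogram comparison described in this section can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the analysis: the function names are hypothetical, pixels are taken to be exactly 1 Å wide, and the $\chi^2$ uses the predicted counts as the variance.

```python
def find_voids(ratio, threshold=0.7, pixel_width=1.0, min_size=5.0):
    """Sizes of contiguous runs of pixels whose flux-to-continuum ratio
    exceeds `threshold`; runs shorter than `min_size` are discarded."""
    sizes, run = [], 0
    for r in list(ratio) + [float("-inf")]:  # sentinel closes a trailing run
        if r > threshold:
            run += 1
        else:
            if run * pixel_width >= min_size:
                sizes.append(run * pixel_width)
            run = 0
    return sizes


def void_histogram(sizes, lo=5.0, hi=70.0, step=5.0):
    """Counts of voids in bins of width `step` between `lo` and `hi`."""
    nbins = int(round((hi - lo) / step))
    counts = [0] * nbins
    for s in sizes:
        k = int((s - lo) // step)
        if 0 <= k < nbins:
            counts[k] += 1
    return counts


def chi_square(observed, predicted):
    """Sum of (N_i - n_i)^2 / n_i over bins with a non-zero prediction."""
    return sum((N - n) ** 2 / n for N, n in zip(observed, predicted) if n > 0)


# Toy spectrum: runs of 6, 3 and 12 transmissive pixels split by two absorbed pixels.
ratio = [0.9] * 6 + [0.2] + [0.9] * 3 + [0.1] + [0.95] * 12
sizes = find_voids(ratio)
print(sizes, void_histogram(sizes))
```

In the toy spectrum the 3-pixel run is dropped by the 5 Å minimum size, leaving voids of 6 Å and 12 Å in the first two bins.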
4. STATISTICAL ANALYSIS

In § 2 we explained how we calculated the bubble size distribution.
In this section, we will convert this bubble size distribution
into the Lyman-α void distribution that could be compared
to the observational data. In general, voids in the observed absorption
spectrum could be produced in two ways. First, in the
usual picture for the undisturbed Lyman-α forest, due to the
density fluctuations, the IGM will contain low density regions 
that produce little absorption. Second, the presence of hot bub- 
bles can produce additional voids, as we explained above. For 
clarity, we refer to these two kinds of voids as "density voids" 
and "ionization voids", respectively. In the first half of this sec- 
tion, we discuss the void-size distribution, including the non- 
trivial overlap between individual voids, ignoring, for simplic- 
ity, the presence of noise. In the second half of this section, we 
discuss the impact of noise on our predicted void-size distribu- 
tions. 

4.1. The Expected Noise-Free Void-Size Distribution

The total number of voids is not simply the sum of density 
voids and ionization voids, since these voids can mix in a non- 
trivial way. For example, a hot bubble may be expanding into 
a low-density region in the IGM, producing an ionization void 
that connects with an adjacent density void, forming a single, 
larger apparent void. Bearing this in mind, we first calculate the 
size distribution of ionization voids. We neglect peculiar velocities
in our analysis. Typical peculiar velocities at $z \approx 3$ on the
relevant large scales are $\lesssim 100$ km/s (e.g. Gnedin & Hamilton
2002), which would correspond to $\sim 1.6$ Å shifts in the apparent
spectrum. This is a small fraction of the smallest void size
we consider. Furthermore, peculiar velocities are proportional 
to the overdensity and will be smaller for spectral pixels of in- 
terest that have lower-than-usual absorption. In this case, the 
size of an ionization void is determined solely by the Hubble 
flow, which in turn scales directly with the size of the hot bub- 
bles. For simplicity, and consistent with the model assumptions 
in the previous section, we further assume that the bubbles are 
spherical. For a given mass m, the volume of a bubble is then 

$V(m) = \frac{m}{\bar\rho\,(1 + \delta_m)}$, (6)

where $\bar\rho$ is the mean density of the universe, and $\delta_m$ is the overdensity
within the bubble. By construction, in the hot bubbles,
$\delta_m$ is equal to $B$ in equation (4). Let us assume that the line
of sight (LOS) intersects a bubble of radius $R = R(m)$ at an impact
parameter $0 < b < R$ (defined as the distance of closest
approach between the center of the hot bubble and the LOS).
The length of LOS within the hot bubble is then $l = 2\sqrt{R^2 - b^2}$,
and the Hubble velocity across this region is

$v_H = H(z)\, l = 2 H(z) \sqrt{R^2 - b^2}$. (7)

The hot region produces a void covering the range of observed
wavelengths

$\Delta\lambda = \lambda_\alpha (1+z) \frac{v_H}{c} = K \sqrt{R^2 - b^2}$, (8)

where $\lambda_\alpha = 1215$ Å is the rest-frame Lyman-α wavelength, $c$
is the speed of light, and in the last step we have defined
$K = 2 \lambda_\alpha (1+z)\, c^{-1} H(z)$. The number density of voids of size $\Delta\lambda$
along a given LOS per unit redshift and unit size is obtained
directly from the mass function through the equation
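$\frac{d^2N}{d\Delta\lambda\, dz}(z, \Delta\lambda) = \int_{m_{\rm min}}^{\infty} dm \int_0^{R} db\, 2\pi b\, \delta\!\left(\Delta\lambda - K\sqrt{R^2 - b^2}\right) \frac{dn}{dm}\, \frac{c\,(1+z)^2}{H(z)}$. (9)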

In this equation, $dn/dm$ is the mass function of hot bubbles,
$m_{\rm min} = m_{\rm min}(\Delta\lambda)$ is the smallest hot bubble that can produce a
void of length $\Delta\lambda$ (at $b = 0$), $d^2V/dz\,d\Omega = c\,H^{-1}(z)(1+z)^2 d_A^2(z)$
is the comoving volume per unit redshift and solid angle, and
$2\pi b\, db\, d_A^{-2}(z)$ is the solid angle subtended by a narrow circular annulus
at impact parameter $b$ and width $db$. Note that the angular
diameter distance, $d_A(z)$, drops out of the equation. The Dirac
delta function in equation (9) enforces the relation between bubble
radius $R$ and impact parameter $b$ to produce a void of fixed
length $\Delta\lambda$. Performing the $b$-integral (using the property of the
delta function $\delta(f(x)) = \delta(x - x_0)/|f'(x_0)|$, where $x_0$ is the root of $f$), we find



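$\frac{d^2N}{d\Delta\lambda\, dz}(z, \Delta\lambda) = \frac{\pi c^3\, \Delta\lambda}{2 \lambda_\alpha^2 H^3(z)} \int_{m_{\rm min}}^{\infty} dm\, \frac{dn}{dm}$. (10)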
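
To give a feeling for the numbers in equation (8), the following Python sketch evaluates the void length for a given bubble radius and impact parameter in the cosmology adopted in this paper. It assumes that $R$ and $b$ are proper (physical) lengths in Mpc, which is our reading of the $(1+z)$ factor in $K$; the function names are hypothetical.

```python
import math

C_KMS = 299792.458                       # speed of light in km/s
LYA = 1215.0                             # rest-frame Lyman-alpha wavelength in Angstrom
H0, OMEGA_M, OMEGA_L = 70.0, 0.3, 0.7    # cosmological parameters adopted in the text


def hubble(z):
    """H(z) in km/s/Mpc for a flat universe with matter and a cosmological constant."""
    return H0 * math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def void_length(R, b, z):
    """Observed void length in Angstrom for a sightline at impact parameter b
    through a bubble of radius R (both assumed to be proper Mpc)."""
    K = 2.0 * LYA * (1.0 + z) * hubble(z) / C_KMS
    return K * math.sqrt(R * R - b * b)


# A bubble of comoving radius 25 Mpc at z = 3.3, crossed through its center.
print(void_length(25.0 / 4.3, 0.0, 3.3))
```

Under these assumptions, a bubble with a comoving diameter of 50 Mpc (about $35\,h^{-1}$ Mpc) at the median redshift of the sample produces a void of roughly 70 Å, consistent with the size range quoted in the abstract.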

To calculate the overall distribution of voids, we need to take 
into account both density voids and ionization voids, and the 
list of possible overlaps between them. If there were only den- 
sity voids, their distribution would obey a simple exponential. 
This follows directly from the Poisson distribution of absorp- 
tion lines (e.g. Crotts 1987; Ostriker et al. 1988) and assumes 
that the spatial correlations between different absorption lines 
on the large scales of interest ($\gtrsim 10$ times the correlation length
of absorption lines; e.g. Cristiani et al. 1997) are negligible.
The number of pure density voids of size $\Delta\lambda$, per unit observed
total wavelength range $\Delta\lambda_{\rm tot}$, and per unit void size $\Delta\lambda$, is then
given by

$\frac{d^2N_0}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} = A \exp(-b\,\Delta\lambda)$. (11)

Here b is related to the mean number of absorption lines per 
unit wavelength, above a given threshold strength, weighted 
appropriately over redshift (see below). A is a normalization 
factor which also depends on the number density of absorption
lines and also, in general, on $\Delta\lambda_{\rm tot}$. Note that $A$ has units
of (wavelength)$^{-2}$. The simplest case is that of a single redshift,
negligible noise, and absorption lines that do not blend
together (we will discuss the issue of noise further below). In
this case, one can compute the total number of expected voids,
and uniquely compute the normalization $A$ as a function of $b$
and $\Delta\lambda_{\rm tot}$: $A = A(b, \Delta\lambda_{\rm tot})$. Taking the limiting case of $\Delta\lambda_{\rm tot} \to \infty$,
and the width of individual absorption lines approaching
zero, we find $A \to b^2$. This unique correspondence, however,
is spoiled when the above assumptions are relaxed. In order 
to avoid addressing these issues, or having to model the mean 
transmission of the forest and its evolution with redshift self- 
consistently, we treat A as an independent free parameter in our 
fitting procedure. It is worth noting, however, that further mod- 
eling could significantly tighten the constraints we derive below 
on the abundance of hot bubbles. In practice, we find deviations 
of up to a factor of 2 from the above limiting formula $A = b^2$,
implying that the above simplifying assumptions do not intro- 
duce gross errors. 
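
The exponential form of the density-void distribution, and the limiting normalization $A = b^2$, can be checked with a simple Monte Carlo experiment. The sketch below is illustrative only: it scatters infinitely narrow absorption lines at random along a long sightline and counts the gaps between neighbors, with hypothetical function names and parameter values.

```python
import math
import random


def gap_counts(b, total_length, lo, hi, seed=0):
    """Scatter lines uniformly at a mean rate b per Angstrom over total_length
    and count the gaps between neighboring lines with sizes in [lo, hi)."""
    rng = random.Random(seed)
    n_lines = int(b * total_length)
    pos = sorted(rng.uniform(0.0, total_length) for _ in range(n_lines))
    return sum(1 for x0, x1 in zip(pos, pos[1:]) if lo <= x1 - x0 < hi)


def expected_counts(b, total_length, lo, hi):
    """total_length times the integral of A exp(-b x) over [lo, hi), with A = b**2."""
    return total_length * b * (math.exp(-b * lo) - math.exp(-b * hi))


sim = gap_counts(0.2, 200000.0, 5.0, 10.0)
ref = expected_counts(0.2, 200000.0, 5.0, 10.0)
print(sim, round(ref))
```

For a line density of 0.2 per Å over $2 \times 10^5$ Å, the simulated and expected numbers of 5-10 Å gaps agree at the percent level, as expected for a Poisson process.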

The presence of ionization voids will disrupt the exponen- 
tial distribution of the density voids. Let us first consider an 
observed void of size $\Delta\lambda$ that contains exactly one ionization
void of some size $s < \Delta\lambda$, overlapping with zero, one, or two
neighboring density voids whose sizes add up to $\Delta\lambda - s$. The
number of such voids, per unit observed total wavelength range
$\Delta\lambda_{\rm tot}$, and per unit void size $\Delta\lambda$, is given at some redshift $z$ by


$\frac{d^2N_1}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} = \frac{A}{\lambda_\alpha} \int_0^{\Delta\lambda} ds\, \frac{d^2N}{d\Delta\lambda\, dz}(s)\, \exp[-b(\Delta\lambda - s)]\, (\Delta\lambda - s)$. (12)
The equation follows from noting that (i) the center of the ionization
void of size $s$ can be placed anywhere over an interval
of length $(\Delta\lambda - s)$, and (ii) none of the pixels in the remaining
length $(\Delta\lambda - s)$ that are not covered by the ionization void
should have an absorption line. Note that in the limit of $b \to \infty$
and $A \to b^2$, density voids will be rare, and equation (12) indeed
reduces to the abundance of ionization voids (eq. 10), as
it should.
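
The one-ionization-void term can be evaluated by direct quadrature. The Python sketch below assumes that equation (12) has the form $(A/\lambda_\alpha) \int_0^{\Delta\lambda} ds\, n_{\rm ion}(s)\, e^{-b(\Delta\lambda - s)} (\Delta\lambda - s)$, where $n_{\rm ion}(s)$ stands for the ionization-void abundance of equation (10) and the factor $1/\lambda_\alpha$ converts from unit redshift to unit wavelength. The prefactor, the function names, and the toy input are assumptions for illustration.

```python
import math

LYA = 1215.0  # rest-frame Lyman-alpha wavelength in Angstrom


def n1(dlam, ion_void_density, A, b, steps=20000):
    """Midpoint-rule estimate of
    (A / LYA) * int_0^dlam ds n_ion(s) * exp(-b * (dlam - s)) * (dlam - s)."""
    h = dlam / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        u = dlam - s  # combined length of the adjoining density voids
        total += ion_void_density(s) * math.exp(-b * u) * u
    return A / LYA * total * h


# Toy check with a flat ionization-void abundance, where the integral is analytic.
b = 0.3
numeric = n1(20.0, lambda s: 1.0, A=b * b, b=b)
analytic = (1.0 - math.exp(-b * 20.0) * (1.0 + b * 20.0)) / LYA
print(numeric, analytic)
```

With $A = b^2$ the kernel $b^2 u\, e^{-bu}$ integrates to unity, so for large $b$ the expression tends to $n_{\rm ion}(\Delta\lambda)/\lambda_\alpha$, which is the limiting behavior described above.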

Similarly, the observed void of size $\Delta\lambda$ could contain two
ionization voids, of sizes $(s, u) < \Delta\lambda$, connecting with neighboring
density voids. The number of these cases is given by
an argument analogous to the previous case, except we need
to enforce the condition that the two ionization bubbles are, by
definition, disjoint. This can be achieved by the following procedure:
(i) choose a size $0 < s < \Delta\lambda$ for the first ionization
void, (ii) then choose a location for this void, measured by the
distance $s < t < \Delta\lambda$ of the right "edge" of the $s$ void from the
left "edge" of the larger $\Delta\lambda$ void, and (iii) finally place a second
ionization void, of size $0 < u < (\Delta\lambda - t)$, anywhere in the
remaining interval $(\Delta\lambda - u - t)$. We find, accordingly,

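$\frac{d^2N_2}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} = \frac{A}{\lambda_\alpha^2} \int_0^{\Delta\lambda} ds \int_s^{\Delta\lambda} dt \int_0^{\Delta\lambda - t} du\, \frac{d^2N}{d\Delta\lambda\, dz}(s)\, \frac{d^2N}{d\Delta\lambda\, dz}(u)\, \exp[-b(\Delta\lambda - s - u)]\, (\Delta\lambda - u - t)$. (13)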


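The total number of voids of size $\Delta\lambda$ is then given by

$\frac{d^2N}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} = \frac{d^2N_0}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} + \frac{d^2N_1}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} + \frac{d^2N_2}{d\Delta\lambda\, d\Delta\lambda_{\rm tot}} + \ldots$, or $n = n_0 + n_1 + n_2 + \ldots$, (14)

where we introduced $n = d^2N / d\Delta\lambda\, d\Delta\lambda_{\rm tot}$
to simplify notation.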



In the rest of this paper, we omit terms in the above formula
beyond "second order" (the two ionization void case); we will
discuss the effect of higher order terms in § 6 below.

As a first check, to see whether hot bubbles have a conspicuous
effect on the Lyman-α spectrum, we fit the observed histogram
for $n(\Delta\lambda)$ using the exponential function $n_0$ alone. The
exponential fit turns out to be adequate, immediately revealing
that we cannot rule out the null hypothesis that the data contain
no hot bubbles. Then we used our model formula in equation
(14) to fit the data. For a given $\zeta$ we adjusted our free
parameters $A$ and $b$ to minimize $\chi^2$ and obtain the maximum
probability. We start with $\zeta = 1$ and increase $\zeta$ gradually in
increments of 0.01 until the fit breaks down, i.e. until the maximum
probability is smaller than a threshold value. We chose
the fiducial value of 10"^ for this threshold probability in our 
analysis (see discussion below). 
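
The stepping procedure just described amounts to a simple scan, sketched below in Python. The callable `max_probability` is a hypothetical stand-in for the best-fit probability obtained after minimizing $\chi^2$ over $A$ and $b$ at fixed $\zeta$; the toy function and the threshold used in the example are arbitrary and serve only to exercise the loop.

```python
import math


def zeta_upper_limit(max_probability, threshold, zeta_start=1.0, step=0.01, zeta_max=100.0):
    """Increase zeta from zeta_start in fixed steps and return the first value
    whose best-fit probability falls below `threshold`; return None if this
    does not happen before zeta_max."""
    n = 0
    while True:
        zeta = zeta_start + n * step
        if zeta > zeta_max:
            return None
        if max_probability(zeta) < threshold:
            return zeta
        n += 1


# Toy stand-in for the fit: a probability that decays smoothly with zeta.
def toy_probability(zeta):
    return math.exp(-10.0 * (zeta - 1.0))


print(zeta_upper_limit(toy_probability, threshold=1e-3))
```

Counting steps with an integer, rather than repeatedly adding 0.01, avoids the slow accumulation of floating-point rounding errors over many increments.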

The procedure outlined above yields an upper limit on $\zeta$ that
corresponds to the threshold probability. Equivalently, we can
convert $\zeta$ to a corresponding upper limit on the global volume-filling
factor $Q$ of the hot bubbles, using the equation



$Q = \int_{\zeta M_{\rm min}}^{\infty} \frac{dn}{dm}\, \frac{m}{\bar\rho\,[1 + B(m,z)]}\, dm$. (15)

4.2. The Impact of Spectral Noise

Before describing our results, we discuss the choice of our
absorption threshold for defining a void in the data, and the impact
of noise in the spectra. The average S/N in the data we
utilize is S/N = 9.94, which, essentially, forced us to select a low
threshold in our definition of a void. Note that the relatively
lenient observational threshold means that some residual absorption
is allowed to take place in the hot bubbles. In our simple
treatment, Lyman-α forest absorption lines are randomly
distributed, yielding the exponential distribution of the density
voids. However, spectral noise tends to produce some additional,
fake absorption lines, which will break up true voids.
Fortunately, noise at different pixels can be treated as uncorrelated
over scales of more than a few pixels. If S/N were a
constant over the whole wavelength range, the exponential void
distribution would remain valid: the effect of the noise could
be absorbed in the uninteresting (for us) constants $A$ and $b$.

Fig. 2. The figure illustrates the impact of noise on the void-size distribution.
We consider a single true void whose size is 40 pixels, with no
true absorption in any of these pixels. Fake absorption spikes, due to Gaussian
noise, are then added. The noise spikes sub-divide the 40-pixel void into
many smaller sub-voids, producing a distribution of void sizes that depends
on the adopted threshold for defining a void. The four panels show the resulting
sub-void size distributions, with four different choices for the threshold.
The choices correspond to decrements below the continuum set at 2, 2.5, 3,
and 4 times the noise. In the latter two cases (bottom two panels) the distribution
approaches a delta function at the 40-pixel size, showing that noise has
little effect when the threshold lies more than $\sim 2.5\sigma$ below the continuum
(i.e., below about 75% of the continuum level for the S/N $\approx 10$ spectra we used in our analysis).
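
The trend shown in Figure 2 can be anticipated with a short calculation. The Python sketch below computes the probability that a 40-pixel void with no true absorption survives intact, assuming independent Gaussian noise in each pixel and a threshold placed a given number of noise standard deviations below the continuum. The function names are hypothetical, and this is a simplified estimate rather than the simulation used for the figure.

```python
from statistics import NormalDist


def pixel_flip_probability(n_sigma):
    """One-sided Gaussian probability that noise lowers a pixel by more than
    n_sigma standard deviations."""
    return NormalDist().cdf(-n_sigma)


def intact_probability(n_pixels, n_sigma):
    """Probability that none of n_pixels independent pixels falls below the threshold."""
    return (1.0 - pixel_flip_probability(n_sigma)) ** n_pixels


for n_sigma in (2.0, 2.5, 3.0, 4.0):
    print(n_sigma, round(intact_probability(40, n_sigma), 3))
```

The survival probability rises from roughly 40% for a $2\sigma$ threshold to about 95% at $3\sigma$ and nearly 100% at $4\sigma$, in line with the behavior described in the caption.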

Unfortunately, in practice, the S/N is usually larger in pix- 
els where the flux is larger, which, in general, would neces- 
sitate further modeling. However, this complication can be 
avoided by choosing a sufficiently low threshold in defining 
a void, such that the number of fake absorption lines in the 
wavelength range we analyze is small. For a rough illustration,
let us assume that the noise is Gaussian, with $1\sigma$ values
corresponding to $\approx 10\%$ of the unabsorbed flux. Since we use
$\approx 10^5$ independent pixels in our analysis, we expect roughly
2250, 600, 135, 25, and 3 pixels with absorption lines extend- 
ing below thresholds of 80, 75, 70, 65, and 60 percent of the 
continuum. As we shall find below, there are 3632, 4064, 4345, 
4573, and 4667 voids in these cases, occupying a total number 
of 37545, 44900, 51558, 58303, and 64833 pixels. Therefore, 
the fractional increase in the number of voids in these cases, 
assuming each fake absorption line results in one extra void, is 
(2250/112203)(37545/3632)=0.21, etc. Clearly, this fractional 
increase is small for thresholds below 70 percent. 
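The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it assumes a one-sided Gaussian tail with σ equal to 10% of the continuum, takes the total pixel count from the ratio quoted in the text, and uses function names chosen here for illustration.

```python
import math

N_PIXELS = 112203   # total number of pixels, read off the ratio quoted in the text
SIGMA = 0.10        # assumed 1-sigma Gaussian noise, as a fraction of the continuum

def gaussian_tail(k_sigma):
    """One-sided probability that a Gaussian deviate falls more than k_sigma below the mean."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

def expected_fake_lines(threshold, n_pixels=N_PIXELS, sigma=SIGMA):
    """Expected number of pixels pushed below `threshold` (fraction of continuum) by noise alone."""
    return n_pixels * gaussian_tail((1.0 - threshold) / sigma)

def fractional_increase(n_fake, pixels_in_voids, n_voids, n_pixels=N_PIXELS):
    """Fake lines landing inside voids, per existing void (one extra void per fake line)."""
    return (n_fake / n_pixels) * (pixels_in_voids / n_voids)

# Numbers quoted in the text, one entry per threshold
thresholds = [0.80, 0.75, 0.70, 0.65, 0.60]
fake_lines = [2250, 600, 135, 25, 3]
n_voids = [3632, 4064, 4345, 4573, 4667]
pixels_in_voids = [37545, 44900, 51558, 58303, 64833]

increases = [fractional_increase(f, pix, n)
             for f, pix, n in zip(fake_lines, pixels_in_voids, n_voids)]
for t, inc in zip(thresholds, increases):
    print(f"threshold {t:.0%}: Gaussian estimate {expected_fake_lines(t):7.1f} fake lines, "
          f"fractional increase {inc:.4f}")
```

With the quoted counts, the fractional increases come out at about 0.21, 0.06, 0.014, 0.003 and 0.0004 for the five thresholds. The exact Gaussian tail gives counts similar to, though slightly larger than, the rough values quoted in the text.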

Next, let us consider including the effect of noise explicitly in our model
fitting function, equation (14). Rather than modeling the noise in the data,
we added noise into our theoretical void distribution, before comparing it
with the data. The effect of uncorrelated, constant-S/N noise on the density
void distribution $n_0$ is automatically absorbed into the free parameters A
and b (i.e. the noise changes only the values of A and b, which are anyway
free parameters in our analysis). Including the effect of noise on the
ionization void distribution is more subtle. First, we assume that there is
no absorption in the ionization voids, and the effect of a noise
absorption-spike is to cut a large ionization void into two smaller ones.
This amounts to a re-distribution
of g^^. Let us consider an ionization void containing x pixels 

(since each pixel is ~1 Å, this also roughly gives the size of the void in
Å). The probability that a void of size y appears, as a result of noise
sub-dividing the true void of size x (with x and y regarded as integer
numbers of pixels), is

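$$
P(y|x) = \sum_{i=0}^{x} \binom{x}{i}\, p^{i} (1-p)^{x-i}\, (1+i)
         \binom{x-i}{y} \left(\frac{1}{1+i}\right)^{y}
         \left(\frac{i}{1+i}\right)^{x-i-y} . \qquad (16)
$$
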
Here, p is the probability that the noise in a given single pixel lowers the
flux below the threshold. For instance, if the threshold is chosen at 90% of
the continuum, or 1σ, then we have p = 0.16. The first factor in the
summation on the right hand side of equation (16),
$\binom{x}{i}\,p^{i}(1-p)^{x-i}$, gives the probability that exactly i
pixels, selected randomly from among x pixels, are lowered below the
threshold. These i pixels then divide the whole void into i+1 smaller
sub-voids (allowing the length of a sub-void to be zero, in cases when noise
spikes occupy neighboring pixels). The factor
$\binom{x-i}{y}\left(\frac{1}{1+i}\right)^{y}\left(\frac{i}{1+i}\right)^{x-i-y}$
gives the probability that a given sub-void's length is exactly y. Finally,
we multiply this last factor by the number of sub-voids, (i+1), and sum over
all the possible values of i.
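Equation (16) is straightforward to evaluate numerically. The following is a minimal sketch that follows the term-by-term description above; the function name `subvoid_distribution` and the example values of x and p are illustrative choices, not part of our fitting code.

```python
from math import comb

def subvoid_distribution(x, p):
    """Evaluate P(y|x) of equation (16) for y = 0..x.

    x : size of the true ionization void, in pixels
    p : probability that noise pushes a single pixel below the threshold
    Returns a list whose element y is the weight given to sub-voids of length y.
    """
    dist = [0.0] * (x + 1)
    for i in range(x + 1):                      # i = number of noise spikes
        p_spikes = comb(x, i) * p**i * (1.0 - p)**(x - i)
        n_clean = x - i                         # pixels left unaffected by noise
        q = 1.0 / (1 + i)                       # chance a clean pixel lies in a given sub-void
        for y in range(n_clean + 1):
            p_len = comb(n_clean, y) * q**y * (1.0 - q)**(n_clean - y)
            dist[y] += p_spikes * (1 + i) * p_len
    return dist

x, p = 40, 0.16
dist = subvoid_distribution(x, p)
n_subvoids = sum(dist)                                   # equals 1 + x*p
clean_pixels = sum(y * w for y, w in enumerate(dist))    # equals x*(1 - p)
```

Two consistency checks follow directly from the formula. Because of the factor (1+i), the sum over y equals the mean number of sub-voids, 1 + xp, rather than unity, and the length-weighted sum equals the mean number of pixels untouched by noise, x(1-p). For p = 0 the distribution collapses to a single void of size x.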

As an illustration of the impact of noise, in Figure 2 we consider a
40-pixel void, which is allowed to be sub-divided by noise. The four
different panels correspond to different choices for the threshold to define
a void. In the absence of noise, we would have a single 40-pixel void in each
panel. The effect of noise is to produce a new distribution of smaller voids,
which can be regarded as an asymmetric kernel, with which the actual
noise-free distribution n should be convolved. As the figure shows, when the
threshold is chosen to be lower than 2.5σ below the continuum (or at ≲ 75% of
the continuum in our case), the noise has little effect on the void size
distribution. Finally, we note a complication that arises during void mixing.
When noise is ignored, any ionization void can connect with density voids on
both sides. But if an ionization void is divided, the sub-voids can only
connect on one side (if the sub-void is on the edge of the original void) or
on neither side (if the sub-void is flanked on both sides by noise spikes).
In our numerical calculation of the noisy void-size distribution, we kept
track of these different types of voids, and treated them accordingly. This
entails modifying equations (12) and (13), which describe only the case when
ionization voids connect on both sides with density voids; in the interest of
brevity, we do not list these modified equations here.
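A quick way to see why the lower panels of Figure 2 approach a delta function is to compute the chance that a void survives with no noise spike at all, $(1-p)^{x}$, which is the i = 0 term of equation (16) for y = x. The sketch below assumes Gaussian noise and thresholds placed k standard deviations below the continuum; the function names are chosen here for illustration.

```python
import math

def spike_probability(k_sigma):
    """Probability that Gaussian noise lowers one pixel more than k_sigma below the continuum."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

def intact_fraction(x, k_sigma):
    """Probability that a void of x pixels contains no noise spike at all."""
    return (1.0 - spike_probability(k_sigma)) ** x

for k in (2.0, 2.5, 3.0, 4.0):                 # the four thresholds of Figure 2
    print(f"{k:.1f} sigma: p = {spike_probability(k):.2e}, "
          f"intact 40-pixel void fraction = {intact_fraction(40, k):.3f}")
```

Under these assumptions a 40-pixel void stays intact about 40%, 78%, 95% and 99.9% of the time for thresholds at 2, 2.5, 3 and 4σ, respectively, in line with the behavior seen in the figure.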

5. RESULTS 

We first list, in Table 1, the result of the exponential fit. As we can see,
the $\chi^2$ likelihoods in this Table, for all five choices of the
threshold, are ≳ 20 percent. This means that the exponential fits are
acceptable, and there is no statistical evidence for hot bubbles, or for any
voids beyond those found in the usual "fluctuating Gunn-Peterson absorption"
picture for the Lyman α forest.



Table 1

Best fitting exponential distributions of the form A exp[-b Δλ] (i.e. models
of the Lyman α forest without any hot bubbles), to the observed histogram of
void sizes. Five different thresholds are considered for defining voids.

Threshold    A         b        $\chi^2$    Likelihood
80%          0.0192    0.198    1.093       0.198
75%          0.0164    0.173    0.437       0.823
70%          0.0136    0.150    0.529       0.757
65%          0.0114    0.132    0.812       0.481
60%          0.0092    0.114    0.762       0.539


Table 2

Upper limits on ζ and on the corresponding volume filling factor Q of hot
bubbles, in the model for the Lyman α forest that includes such hot bubbles.
The fit is considered unacceptable when the likelihood drops below W^. For
comparison, we also calculated the upper limits of ζ and Q with less
stringent likelihood thresholds of 10"^, and also when the formula in
equation (14) is cut at the first order (the probability threshold is 10^^).
These results are listed as the first and second values within parentheses,
respectively.

Threshold    ζ                     Volume filling factor Q
80%          5.70 (5.64, 5.72)     0.226 (0.221, 0.227)
75%          5.14 (4.94, 5.44)     0.185 (0.172, 0.206)
70%          4.89 (4.69, 5.31)     0.169 (0.156, 0.197)
65%          4.94 (4.72, 5.42)     0.172 (0.158, 0.205)
60%          5.77 (5.61, 6.35)     0.231 (0.219, 0.281)



Next, we constrain the abundance of hot bubbles using our fitting formula in
equation (14), modified to include the effects of a constant Gaussian noise
as discussed in the preceding section. Table 2 gives the maximum values of ζ
and the corresponding maximum allowed hot bubble volume filling factors (Q),
with likelihoods at 10"^. The tightest constraint we find is for a threshold
of 70%, in which case the volume filling factor of hot bubbles is at most
16.9%. Interestingly, the constraint does not vary monotonically with the
threshold. As the threshold is lowered below 70%, the number of large voids
obtained from the spectra increases, which weakens our constraints. As the
threshold is increased above 70%, our constraint again becomes weaker,
because spectral noise in this case can eliminate the large voids that the
models would otherwise predict.

In order to investigate the importance of the likelihood threshold, in Table
2 we list within parentheses (first value) the upper limit on ζ when the
probability threshold is increased to 10"^. As we can see, the results change
only a little. This is not surprising: we find that probabilities fall very
rapidly with increasing ζ, once they reach a level below 10"^. In Figure 3 we
show two explicit void-size histograms used in our fitting procedure. The
threshold is chosen to be 70% in the lower panel, and 75% in the upper panel;
the ζ's are at their upper limits
Fig. 3. - Results of fitting model void size histograms to the data. The
threshold for defining a void is 75% (70%) of the unabsorbed continuum in
the upper (lower) panel. The crosses show the data points inferred from 137
quasar spectra; the dashed curves show the best fitting results of our
fitting formula (14) when ζ is at its upper limit (i.e. with the largest
allowed filling factor of hot bubbles), while the dotted curves correspond to
the exponential fits (i.e. models without hot bubbles). We also show the
contribution from each void-size bin to the total $\chi^2$. Models with large
filling factors can be ruled out primarily because they over-produce the
number of 40-50 Å voids; they also under-produce the number of 10-15 Å voids.



listed in Table 2. In the same figure, we also plot the contribution from
each void-size bin to the total $\chi^2$. As we can see, the hot bubbles not
only increase the number of large voids in the Lyman-α forest, but also
disturb the distribution of smaller voids. In other words, the primary reason
we can rule out these models with large bubble filling factors is that they
overproduce the number of 40-50 Å voids; however, they also significantly
underproduce the number of smaller, 10-15 Å voids. We find that the free
choice of A always assures a good fit to the size-distribution of the
intermediate size voids, in the range 15-40 Å.

To investigate the importance of the assumed value of $M_{\rm min}$, in the
range motivated in the Introduction, we replace our fiducial value of
$M_{\rm min} = 10^{11}\,M_\odot$ by $10^{9}$, $10^{10}$ and
$10^{12}\,M_\odot$, and repeat our analysis in each case. The results are
listed in Table 3. The upper limits on ζ generally increase as $M_{\rm min}$
increases. Physically, this is because a single collapsed clump is allowed to
ionize a larger region, when the total number of such ionizing clumps is
reduced, due to an increase in $M_{\rm min}$. The volume filling factors, on
the other hand, decrease with increasing $M_{\rm min}$. This is because
low-mass halos tend to produce small ionized regions, which are too small to
effectively disturb the exponential void distribution at sizes above 5 Å, but
these small bubbles still contribute to the volume filling factor. The
increase in $M_{\rm min}$ eliminates these smaller ionized regions, and such
models are therefore easier to constrain. For example, for
$M_{\rm min} = 10^{12}\,M_\odot$, we find a relatively tight limit of
Q ≲ 11%.

6. DISCUSSION 

It is interesting to ask whether the constraints we obtained above, using 137
quasars, could become significantly tighter by simply increasing the number
of spectra. The SDSS database (up to DR4) contains approximately 30,000
quasar spectra at redshifts z > 2.3. At a fixed value of A and b, $\chi^2$
will increase roughly in proportion to the number of quasar spectra. To
quantify the effect of this increase on the upper limit on Q, we generated
mock void-size histograms implementing realizations of the exact exponential
void distribution, with A and b chosen to be the best-fit values from Table
1. As a test of the method, we first generated histograms corresponding to
137 quasar spectra. When fitting our model to these mock data, we found an
upper limit Q < 16.9% at the threshold of 70%, in agreement with the results
in Table 2 using the actual spectra. Next, we generated mock histograms for a
hypothetical 13,700 quasar spectra. We found that this 100-fold increase in
the number of quasars improved the upper limit on the volume filling factor
to Q < 6.6%.
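The leverage gained from more spectra can be sketched by noting that, for Poisson errors, the goodness-of-fit mismatch between two fixed model histograms grows linearly with the number of spectra. The bin edges, the per-spectrum normalization, and the perturbed slope `b_alt` below are illustrative assumptions; only the 70% best-fit values of A and b come from Table 1, and the statistic is evaluated on expected counts rather than on random realizations.

```python
import math

def expected_counts(A, b, edges, n_spectra):
    """Expected voids per size bin for n_spectra spectra, integrating
    A*exp(-b*size) over each bin (a per-spectrum normalization is assumed)."""
    return [n_spectra * (A / b) * (math.exp(-b * lo) - math.exp(-b * hi))
            for lo, hi in zip(edges[:-1], edges[1:])]

def chi2(model, data):
    """Chi-square with Poisson variances equal to the data counts."""
    return sum((m - d) ** 2 / d for m, d in zip(model, data))

edges = list(range(5, 75, 5))        # hypothetical 5 Angstrom-wide bins from 5 to 70
A, b = 0.0136, 0.150                 # Table 1 best fit for the 70% threshold
b_alt = 0.140                        # hypothetical competing model with a flatter tail

def mismatch(n_spectra):
    """Chi-square between the competing model and the exponential 'data'."""
    data = expected_counts(A, b, edges, n_spectra)
    model = expected_counts(A, b_alt, edges, n_spectra)
    return chi2(model, data)

ratio = mismatch(13700) / mismatch(137)   # effect of a 100-fold larger sample
```

Here `ratio` comes out at 100 to within rounding, so a model that is marginally allowed with 137 spectra is penalized a hundred times more heavily with 13,700. The actual improvement in the limit on Q also depends on how the model histogram changes with ζ, which this sketch does not attempt to capture.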

Another way to improve the sensitivity of the constraints would be to use
spectra with higher signal-to-noise ratio and/or with higher resolution. To
illustrate the impact of noise, we assumed noise is negligible, and repeated
our analysis for the 80% threshold case. We found that the upper limits on ζ
and Q improve from (5.70, 0.226) to (4.67, 0.155), respectively. We expect
the limits to tighten further if we raise our threshold, as would be possible
if the spectral noise were indeed very small. We expect that the main
improvement allowed by higher resolution spectra is that we could utilize the
void-size distribution down to smaller sizes, below 5 Å. This would be
significantly more than just an additional bin of data. This is because as ζ
is decreased below some value, the typical bubble radius produced around a
single collapsed halo will become smaller than the mean distance between
collapsed halos. In this case, there will be very little overlap between
different hot bubbles, and the ionization voids will become small.
Constraints on ζ and the filling factor of such small voids would only be
possible in higher resolution spectra. The correlation between Lyman-α
absorption lines can no longer be ignored on scales smaller than those we
utilized here, and would have to be modeled in analyzing higher-resolution
spectra.

Table 3

Upper limits on ζ and on the volume filling factors of hot bubbles, when the
minimum halo mass to produce a bubble, $M_{\rm min}$, is assumed to be
$10^{9}$, $10^{10}$, $10^{11}$ and $10^{12}\,M_\odot$.

             $10^{9}\,M_\odot$   $10^{10}\,M_\odot$   $10^{11}\,M_\odot$   $10^{12}\,M_\odot$
Threshold    ζ       Q           ζ       Q            ζ       Q            ζ        Q
80%          2.39    0.314       3.28    0.278        5.70    0.226        16.50    0.146
75%          2.27    0.271       3.05    0.234        5.14    0.185        14.55    0.118
70%          2.20    0.249       2.94    0.215        4.89    0.169        13.76    0.108
65%          2.21    0.252       2.95    0.217        4.94    0.172        13.97    0.111
60%          2.38    0.310       3.27    0.276        5.77    0.231        18.06    0.170

The main virtue of our model is that it takes bubble mergers into account.
Nevertheless, it is based on assumptions that are likely to be
oversimplifications. First, we treated ζ as a constant, while it is possible
that it could be strongly dependent on the mass and the environment of the
collapsed halo, and could also evolve with redshift. For example, the
star-formation rate may scale roughly linearly with the mass of the gas
reservoir, and hence with the halo mass. However, the gravitational binding
energy per unit mass in a halo of mass M scales as
$kT \propto M^{2/3}(1+z)$, making it more difficult for winds to escape from
larger halos. Furthermore, in its original context of reionization, the
bubble-merger model we adopted here was motivated by the reasonable
assumption that the merger of two photoionized bubbles conserves the total
ionized mass. If the hot bubbles are produced by overlapping galactic winds
(rather than photoionization), then this assumption is likely to be much less
accurate. When two winds overlap, they will interact dynamically, and the
winds will not instantaneously propagate to the edge of the joint bubble, to
conserve mass. As a result, it is likely that the effective value of ζ will
further decrease as the overlap between winds becomes more significant. It
would be possible to incorporate an M- and z-dependence, ζ = ζ(M, z), in our
modeling, as well as a further decrease in ζ that depends on the number of
galaxies per bubble, but we leave such improvements to future work.

Our modeling also assumes spherical "hollow" bubbles, and ignores their inner
structures. It is possible for bubbles to be quite non-spherical; this would
be similar to smoothing the size-distribution with an appropriate scatter,
representing the dispersion in radial extent when the line of sight crosses a
bubble in different directions. The impact of such a scatter would be to
increase the number of large voids. Since the upper limit we found is driven
by the predicted number of such large voids (Fig. 3), these upper limits
should improve if the scatter due to non-sphericity were included. On the
other hand, it is also possible that there is residual neutral hydrogen
within the hot bubbles, so the voids are not completely empty, either due to
incomplete mixing between hot wind material and the ambient IGM, or due to
radiative transfer effects (if bubble heating is due to photo-ionization).
Some numerical simulations also indicate that the galactic winds tend to
expand preferentially towards lower density regions and leave the relatively
more overdense filaments, which produce the deeper Lyman α absorption lines,
intact (e.g. Theuns et al. 2002; Bruscoli et al. 2003; McDonald et al. 2005).
In particular, Theuns et al. (2002) explicitly show that in their models for
galactic winds, the winds produce no discernible effect on the column density
distribution of absorption lines even down to column densities below 10^^
cm$^{-2}$. In this case, the winds would produce very few, if any, new voids
in Lyman α spectra, even at stringent thresholds. This conclusion, however,
may not be generic: it must depend on the spatial distribution of sources and
the nature and geometry of winds, as well as on the filling factor of winds
(e.g. we expect winds to ultimately penetrate the denser regions, if their
filling factor is high).

Our analysis also neglects the correlations both among Lyman-α lines and
between Lyman-α lines and hot bubbles. Both deep Lyman-α absorption lines and
hot bubbles tend to appear in overdense regions, so there should be a
positive correlation between the two, which should be taken into account in
future modeling. Finally, in our fitting procedure, we omit terms above
second order in equation (14). Intuitively, one expects that higher order
terms tend to produce even larger voids, and, as Figure 3 shows, it is these
large predicted voids that yield our constraints. This expectation is borne
out in Table 2, where we list the upper limits on ζ when only the first order
term is retained (i.e. we use $n_0 + n_1$ in equation (14)). The Table shows
that omitting the second order terms weakens the constraints.

Finally, we note that the three-year data from WMAP favor a lower value for
the power spectrum normalization than the fiducial $\sigma_8 = 0.9$ we
adopted. We have explicitly verified, however, that this choice has no
significant effect on our conclusions. In particular, we repeated the
calculations from Table 2, with all parameters left unchanged, except
replacing $\sigma_8 = 0.9$ by $\sigma_8 = 0.75$. We found that this changes
the upper limits on the filling factor Q by less than 3 percent, although the
corresponding values of the efficiency ζ are increased by a factor of about
two. This latter change is to be expected: the reduction in the underlying
dark matter halo abundance implies that producing the same filling factor
requires a higher efficiency.

7. CONCLUSIONS 

Motivated by the empirical evidence for significant "pre- 
heating" of at least parts of the IGM at z 3, we made a sim- 
ple model for the spatial distribution of preheated regions. The 
model assumes spherical hot bubbles around collapsed dark matter halos, and
allows these spheres to merge into larger "super-bubbles". We predicted the
number of voids that such hot bubbles would produce in Lyman α absorption
spectra of background quasars.

Our comparison with the observed spectra of 137 quasars did 
not uncover evidence for hot bubbles at z ~ 3. Instead, we 
found an upper limit on the volume-filling factor of hot bub- 
bles, ranging from 11-25%, depending on the assumed size of 
the smallest halo that produces hot bubbles. This is comparable
to the fraction of the total mass in the present-day
universe in low-mass clusters and groups, suggesting that the
pre-heating at z ^ 3 may not have affected all the gas currently 
residing in these objects. 

These constraints are complementary to studies of the impact of galactic
winds on Lyman α absorption spectra in the vicinity of known galaxies (LBGs).
The latter approach is a more sensitive probe of the effects of the LBGs
themselves, whereas searching the "global" statistics is sensitive to
feedback from undetected galaxies whose spatial distribution is not strongly
correlated with LBGs.

While the constraints we obtain are still relatively weak, they suggest that
pre-heating, if it occurred, avoided heating the low-density gas in the
proto-cluster regions, either by operating relatively recently (z ≲ 3) or by
depositing entropy preferentially in the high-density regions. We expect that
our constraints could be improved significantly by analysing a larger number
of quasar spectra, and by improving on the simple model presented here.